In this notebook, we'll use IPython.parallel (IPP) and rpy2 as a quick-and-dirty way of parallelizing work in R. To demonstrate, we'll run a cluster of IPP engines on the same VM as the notebook server. We'll also need to install rpy2 before we can start.
!pip install rpy2
In [20]:
from IPython.html.services.clusters.clustermanager import ClusterManager
In [21]:
cm = ClusterManager()
We have to list the profiles before we can start anything, even if we know the profile name.
In [60]:
cm.list_profiles()
Out[60]:
For demo purposes, we'll just use the default profile, which starts a cluster on the local machine for us.
In [61]:
cm.start_cluster('default')
Out[61]:
After running the command above, we need to pause for a few moments to let all the workers come up. (Breathe and count 10 ... 9 ... 8 ...)
Now we can create a DirectView that can talk to all of the workers. (If you get an error, breathe, count some more, and try again in a few seconds.)
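If you'd rather not count on a fixed pause, a small polling loop can wait until the engines have registered. This is only a sketch, not part of the original workflow; the throwaway probe client and the 30-second limit are arbitrary choices, and it simply retries until the controller answers and at least one engine shows up.
In [ ]:
import time
import IPython.parallel

# Retry until the controller accepts connections and at least one
# engine has registered; give up after roughly 30 seconds.
for _ in range(30):
    try:
        probe = IPython.parallel.Client()
        if len(probe.ids) > 0:
            break
    except Exception:
        pass  # controller not ready yet
    time.sleep(1)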
In [27]:
import IPython.parallel
In [62]:
client = IPython.parallel.Client()
In [63]:
dv = client[:]
In my case, I have 8 CPUs, so I get 8 workers by default. Your number will likely differ.
In [72]:
len(dv)
Out[72]:
To ensure the workers are functioning, we can ask each one to run the bash command echo $$ to print a PID.
In [64]:
%%px
!echo $$
Before we can run R on the engines, we need to load the rpy2 IPython extension on every one of them.
In [ ]:
%%px
%load_ext rpy2.ipython
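As a quick sanity check (not in the original notebook), we can ask every engine to evaluate R.version.string through the %R line magic; each engine should answer with its R version.
In [ ]:
%%px
%R R.version.string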
Now we can tell every engine to run R code using the %%R (or %R) magic. Let's sample 50 random numbers from a normal distribution on each engine.
In [77]:
%%px
%%R
x <- rnorm(50)
summary(x)
Next, each engine pulls its copy of x from R into its Python namespace and converts it to a plain list so it can be gathered.
In [78]:
%%px
%Rpull x
x = list(x)
Back on the client, we gather the per-engine lists into one flat list.
In [79]:
x = dv.gather('x', block=True)
We should get 50 elements per engine.
In [80]:
assert len(x) == 50 * len(dv)
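The same scatter/compute/gather pattern is all we need to parallelize real R work. The sketch below is illustrative rather than part of the original notebook: the names n and m are made up, and the mean of a random sample is just a stand-in for whatever R function you actually want each engine to run on its chunk.
In [ ]:
# Give each engine its own piece of work (here, a sample size).
dv.scatter('n', [10000] * len(dv), block=True)
In [ ]:
%%px
n = n[0]  # scatter delivers a one-element list to each engine
In [ ]:
%%px
%%R -i n -o m
# each engine draws its own sample in R and computes a statistic
m <- mean(rnorm(n))
In [ ]:
# Collect one result per engine back on the client.
m = dv.gather('m', block=True)
Swap the body of the %%R cell for your own function to fan any embarrassingly parallel R job out across the engines.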
When we're finished, we shut the cluster down to free the workers.
In [81]:
cm.stop_cluster('default')
Out[81]: